This notebook is arranged in cells. Text is usually written in the Markdown cells, where you can use HTML tags (to make it bold, italic, colored, etc.). You can double-click on this cell to see the formatting.
Ellipses (...) are provided where you are expected to write your solution, but feel free to change the template (within reason) if this style is not to your taste.
Hit "Shift-Enter" on a code cell to evaluate it. Double click a Markdown cell to edit.
import numpy as np
from scipy.integrate import quad
#For plotting
import matplotlib.pyplot as plt
%matplotlib inline
Mount your Google Drive on your runtime using an authorization code.
Note: When using the 'Mount Drive' button in the file browser, no authentication codes are necessary for notebooks that have only been edited by the current user.
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
# Decorations :D
def canvas_ticks(obj):
    '''Add ticks to a blank canvas. For single plots, pass plt as the
    argument; for subplots (e.g. when using gridspec), pass the axis
    (e.g. ax1) instead.'''
    obj.minorticks_on()
    obj.tick_params(labelsize=14)
    obj.tick_params(labelbottom=True, labeltop=False, labelright=False, labelleft=True)
    obj.tick_params(direction='in', which='minor', length=5, bottom=True, top=True, left=True, right=True)
    obj.tick_params(direction='in', which='major', length=10, bottom=True, top=True, left=True, right=True)
plt.style.use('default')
plt.rcParams['mathtext.fontset'] = 'stix'
plt.rcParams['font.family'] = 'STIXGeneral'
plt.rcParams['font.size'] = 14
error_bar_settings = {
    'fmt': 'o',
    'ms': 5,
    # 'mfc': plot_color,
    'ecolor': 'black',
    'mec': 'black',
    'capsize': 1.5,
    'mew': 0.25,
    'elinewidth': .5,
    # 'alpha': 0.85,
}
Planck is a third-generation space telescope, following COBE and WMAP, and it aims to determine the geometry and content of the Universe by observing the cosmic microwave background (CMB) radiation, emitted around 380,000 years after the Big Bang. Permeating the whole Universe and containing information on the properties of the early Universe, the CMB is widely regarded as the strongest evidence for the Big Bang model.
Measurements of the CMB spectrum confirm that it is very close to the radiation from an ideal blackbody, and fluctuations in the spectrum are very small. Averaging over all locations, its mean temperature is $2.725\,\mathrm{K}$, and its root mean square temperature fluctuation is $\langle(\frac{\delta T}{T})^2\rangle^{1/2} = 1.1 \times 10^{-5}$ (i.e. the temperature of the CMB varies by only ~ 30 $\mu K$ across the sky).
Suppose you observe the fluctuations $\delta T/T$. Since we are taking measurements on the surface of a sphere, it is useful to expand $\delta T/T$ in spherical harmonics (because they form a complete set of orthogonal functions on the sphere):
$$ \frac{\delta T}{T} (\theta, \phi) = \sum_{l = 0}^{\infty} \sum_{m = -l}^{l} \mathrm{a}_{lm} \mathrm{Y}_{lm} (\theta, \phi) $$
In flat space, we can do a Fourier transform of a function $f(x)$ as $\sum_k \mathrm{a}_k \mathrm{e}^{ikx}$ where $k$ is the wavenumber, and $|\mathrm{a}_k|$ determines the amplitude of the mode. For spherical harmonics, instead of $k$, we have $l$, the number of the modes along a meridian, and $m$, the number of modes along the equator. So $l$ and $m$ determine the wavelength ($\lambda = 2\pi/l$) and shape of the mode, respectively.
In cosmology, we are mostly interested in learning the statistical properties of this map and how different physical effects influence different physical scales, so it is useful to define the correlation function $C(\theta)$ and split the CMB map into different scales.
Suppose that we observe $\delta T/T$ at two different points on the sky. Relative to an observer, they are in directions $\hat{n}$ and $\hat{n}'$ and are separated by an angle $\theta$ given by $\cos\theta = \hat{n} \cdot \hat{n}'$. Then, we can find the correlation function by multiplying together the values of $\delta T/T$ at the two points and averaging the product over all pairs of points separated by the angle $\theta$.
$$ C(\theta)^{TT} = \Big\langle \frac{\delta T}{T}(\hat{n})\frac{\delta T}{T}(\hat{n}') \Big\rangle_{\hat{n} \cdot \hat{n}' = \cos\theta}$$
The above expression is specific to the temperature fluctuations, but we can also do a similar analysis for the polarization map of the CMB. (The CMB is polarized because it was scattered off of free electrons during decoupling.) We decompose the polarization pattern in the sky into a curl-free "E-mode" and grad-free "B-mode."
However, the CMB measurements (limited by the experiment resolution and the patch of sky examined) tell us about $C(\theta)$ over only a limited range of angular scales. (i.e. the precise values of $C(\theta)$ for all angles from $\theta = 0$ to $\theta = 180^\circ$ is not known.) Hence, using the expansion of $\delta T/T$ in spherical harmonics, we write the correlation function as:
$$ C(\theta) = \frac{1}{4\pi}\sum_{l=0}^\infty (2l+1) C_l P_l(\cos\theta) $$
where $P_l$ are the Legendre polynomials.
So we break down the correlation function into its multipole moments $C_l$, which is the angular power spectrum of the CMB.
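As a sanity check, the expansion above can be evaluated directly with scipy's Legendre polynomials. This is a minimal sketch with toy, hypothetical $C_l$ values (real ones would come from data or a Boltzmann code):

```python
import numpy as np
from scipy.special import eval_legendre

# Toy multipole moments C_l (hypothetical values, for illustration only)
l = np.arange(2, 100)
C_l = 1.0 / (l * (l + 1.0))
theta = np.linspace(0.0, np.pi, 181)

# C(theta) = (1/4 pi) * sum_l (2l+1) C_l P_l(cos theta)
C_theta = np.sum((2 * l[:, None] + 1) * C_l[:, None]
                 * eval_legendre(l[:, None], np.cos(theta)[None, :]),
                 axis=0) / (4 * np.pi)
```

At $\theta = 0$ every $P_l(1) = 1$, so the sum reduces to $\sum_l (2l+1)C_l/4\pi$, which is a convenient spot check.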
Remember that $\lambda = 2\pi/l$. So $C_l$ measures the amplitude as a function of wavelength. ($C_l = \frac{1}{2l+1}\sum_{m = -l}^l |\mathrm{a}_{lm}|^2$). In this problem, we will consider the temperature power spectrum $C_l^{TT}$, the E-mode power spectrum $C_l^{EE} = \frac{1}{2l+1}\sum_{m = -l}^l |\mathrm{a}_{lm}^E|^2$, and the temperature-polarization cross-correlation $C_l^{TE} = \frac{1}{2l+1}\sum_{m = -l}^l \mathrm{a}_{lm}^{T*} \mathrm{a}_{lm}^E$.
The CMB angular power spectrum is usually expressed in terms of $D_l = l(l+1)C_l/2\pi$ (in units of $\mu K^2$) because this better shows the contribution toward the variance of the temperature fluctuations.
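For example, converting a toy set of $C_l$ values (hypothetical numbers, not real data) to $D_l$:

```python
import numpy as np

# Hypothetical C_l values, just to illustrate the D_l convention
l = np.arange(2, 2001)
C_l = 1.0 / l**2

# D_l = l(l+1) C_l / (2 pi), the quantity usually plotted (in muK^2 for real data)
D_l = l * (l + 1) * C_l / (2 * np.pi)
```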
Cosmologists have built software packages known as cosmological Boltzmann codes, which compute the theoretical power spectrum given cosmological parameters, such as the Hubble constant and the baryon density. Therefore, we can fit the theoretical power spectrum to the measured one in order to obtain the best-fit parameters.
Here, we consider six selected cosmological parameters, $\vec{\theta} = [\theta_1, \theta_2, ..., \theta_6] = [H_0, \Omega_b h^2, \Omega_c h^2, n_s, A_s, \tau]$. ($H_0$ = Hubble constant, $\Omega_b h^2$ = physical baryon density parameter, $\Omega_c h^2$ = physical cold dark matter density parameter, $n_s$ = scalar spectral index, $A_s$ = curvature fluctuation amplitude, $\tau$ = reionization optical depth.). We provide you with the measured CMB power spectra from Planck Data Release 2.
References :
http://carina.fcaglp.unlp.edu.ar/extragalactica/Bibliografia/Ryden_IntroCosmo.pdf (Chapter 9, Intro to Cosmology, Barbara Ryden)
In class, we learned that the Fisher information matrix is useful for designing an experiment; we can vary the experiment design and predict the level of the expected error on any given parameter. In this problem, we aim to determine how well a low-noise, high-resolution future CMB survey would do in constraining the cosmological parameters.
The Fisher matrix is defined as the ensemble average of the Hessian of the log-likelihood ($\ln\mathcal{L}$) with respect to the given parameters $\vec{\theta}$:
$$ F_{ij} = -\left\langle\frac{\partial^{2}\ln\mathcal{L}}{\partial\theta_i\ \partial\theta_j}\right\rangle $$
Here we take the model CMB power spectrum as our observables. (Here we consider the auto-correlations $D_l^{TT}, D_l^{EE}$ and cross-correlation $D_l^{TE}$ obtained from the boltzmann code using the best-fit cosmological parameters from Planck, https://arxiv.org/pdf/1502.01589v3.pdf.) Then, we can estimate the Fisher matrix between two parameters $\theta_i$ and $\theta_j$ as:
$$ F_{ij} = \sum_{l} \sum_{k}\frac{1}{(\sigma_l^k)^2}\frac{\partial D^{k}_{\ell}}{\partial\theta_i}\frac{\partial D^{k}_{\ell}}{\partial\theta_j} $$
where we sum over the CMB auto- and cross-power spectra $D_l^k = [D_l^{0}, D_l^{1}, D_l^{2}] = [D_l^{TT}, D_l^{EE}, D_l^{TE}]$, and we assume that there is no correlation between them. $(\sigma_l^k)^2$ is the variance of $D_l^k$, including the noise $N_l^k$:
$$ (\sigma_l^k)^2 = \frac{2}{(2l+1) \cdot f_{sky} \cdot \Delta l}(D_l^k + N_l^k)^2 $$
$f_{sky}$ is the fraction of the sky covered by the survey. Assume that $f_{sky} = 1$ for the sake of simplicity. $\Delta l$ is the size of $l$-bin. Here, we set $l_{min} = 2, l_{max} = 2000$, and we have 92 $l$-bins in this range (For $2 \leq l < 30$, the power spectra are not binned ($\Delta l = 1$), and for $30 \leq l < 2000$, they are binned, and the bin size is $\Delta l = 30$). We obtain the measured and model power spectrum in that 92 $l$-bins.
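The error-bar formula with this binning scheme can be sketched as follows; `sigma_l` is a hypothetical helper (not part of the provided data files), applied here to toy $D_l$ values:

```python
import numpy as np

def sigma_l(l, D_l, N_l, f_sky=1.0):
    # Bin width: 1 for l < 30, 30 otherwise, matching the binning described above
    delta_l = np.where(l < 30, 1, 30)
    return np.sqrt(2.0 / ((2 * l + 1) * f_sky * delta_l)) * (D_l + N_l)

l = np.array([2, 10, 45, 1000])
D = np.ones(4)                    # toy spectrum values
print(sigma_l(l, D, N_l=0.0))     # pure cosmic variance when N_l = 0
```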
In Parts 1 and 2, the following are given:
(1) model power spectra ($D_l^{TT}, D_l^{EE}, D_l^{TE}$) evaluated at the best-fit values from Planck (i.e. assuming $\vec{\theta}_{best-fit}$)
(2) ($\frac{\partial D_l^{TT}}{\partial H_0}\Big\vert_{\vec{\theta} = \vec{\theta}_{best-fit}}$, $\frac{\partial D_l^{TT}}{\partial \Omega_bh^2}\Big\vert_{\vec{\theta} = \vec{\theta}_{best-fit}}$, etc.): the derivatives of the power spectra with respect to each parameter $\theta$, evaluated at the best-fit values from Planck (i.e. assuming $\vec{\theta}_{best-fit}$)
(3) the measurement error $\sigma_{l, Planck}^{TT}, \sigma_{l, Planck}^{EE}, \sigma_{l, Planck}^{TE}$ assuming the noise from the current Planck survey
(4) (effective) $l$ values for which the spectra are measured. (i.e. these are the effective bin center values for each $l$ bin.)
In Parts 3 and 4, you assume a zero-noise futuristic survey, so you need to compute new measurement errors $\sigma_l$ assuming $N_l = 0$.
Finally, we can compute the Fisher matrix $F$ and obtain the covariance matrix $C$ by inverting $F$:
$$ [C] = [F]^{-1} $$
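A minimal illustration with a hypothetical 2-parameter Fisher matrix: inverting it gives the covariance, and the square roots of the diagonal are the marginalized 1-sigma errors.

```python
import numpy as np

# Hypothetical 2-parameter Fisher matrix (toy numbers)
F = np.array([[10.0, 3.0],
              [3.0, 2.0]])

# Covariance matrix C = F^{-1}
Cov = np.linalg.inv(F)

# Marginalized 1-sigma errors on each parameter
errors = np.sqrt(np.diag(Cov))
print(errors)
```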
References:
Fisher Matrix Forecasting Review, Nicholas Kern
https://arxiv.org/pdf/0906.4123.pdf
1. First, load the measurement errors ($\sigma_l^{TT}, \sigma_l^{EE}, \sigma_l^{TE}$), the model power spectra ($D_l^{TT}, D_l^{EE}, D_l^{TE}$), and their derivatives with respect to the six cosmological parameters evaluated at the best-fit values from Planck ($\frac{\partial D_l^{TT}}{\partial H_0}\Big\vert_{\vec{\theta} = \vec{\theta}_{best-fit}}$, $\frac{\partial D_l^{TT}}{\partial \Omega_bh^2}\Big\vert_{\vec{\theta} = \vec{\theta}_{best-fit}}$, etc.). With the measurement errors from Planck, construct the Fisher matrix and the covariance matrix (you can use numpy.linalg.inv for the matrix inversion). Evaluate the constraints on the six parameters $\sigma(H_0), \sigma(\Omega_bh^2), ... , \sigma(\tau)$ (corresponding to the square roots of the diagonal entries of the covariance matrix). Print the results.
# Load data
# Best-fit values of the cosmological parameters from https://arxiv.org/pdf/1502.01589v3.pdf
H0 = 67.27
ombh2 = 0.02225
omch2 = 0.1198
ns = 0.9645
As = 2.2065e-9
tau = 0.079
theta_best_Planck = np.array([H0, ombh2, omch2, ns, As, tau])
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_EE_measured.dat")
# l (same for all power spectra)
l = data[:,0]
# Planck noise
# sigma_l for D_l^EE assuming Planck N_l
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_EE_measured.dat")
sigma_l_Planck_EE = data[:,2]
# sigma_l for D_l^TT assuming Planck N_l
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_TT_measured.dat")
sigma_l_Planck_TT = data[:,2]
# sigma_l for D_l^TE assuming Planck N_l
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_TE_measured.dat")
sigma_l_Planck_TE = data[:,2]
# Model power spectra given theta_best_Planck (calculated at the same l bins as the measured power spectrum)
# D_l^EE (model)
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_EE_model_at_theta_best_Planck.dat")
EE_model_Planck = data[:,1]
# D_l^TT (model)
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_TT_model_at_theta_best_Planck.dat")
TT_model_Planck = data[:,1]
# D_l^TE (model)
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_TE_model_at_theta_best_Planck.dat")
TE_model_Planck = data[:,1]
# Derivative of the power spectrum given theta_best_Planck (calculated at the same l bins as the measured power spectrum)
# Derivative of D_l^EE with respect to six parameters
# ([theta1, theta2, theta3, theta4, theta5, theta6] = [H_0, \Omega_b h^2, \Omega_c h^2, n_s, A_s, \tau])
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_derivative_EE_at_theta_best_Planck.dat")
deriv_DlEE_theta1 = data[:,1]
deriv_DlEE_theta2 = data[:,2]
deriv_DlEE_theta3 = data[:,3]
deriv_DlEE_theta4 = data[:,4]
deriv_DlEE_theta5 = data[:,5]
deriv_DlEE_theta6 = data[:,6]
# Derivative of D_l^TT with respect to six parameters
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_derivative_TT_at_theta_best_Planck.dat")
deriv_DlTT_theta1 = data[:,1]
deriv_DlTT_theta2 = data[:,2]
deriv_DlTT_theta3 = data[:,3]
deriv_DlTT_theta4 = data[:,4]
deriv_DlTT_theta5 = data[:,5]
deriv_DlTT_theta6 = data[:,6]
# Derivative of D_l^TE with respect to six parameters
data = np.loadtxt("/content/drive/My Drive/P188_288/P188_288_HW4/Project1_derivative_TE_at_theta_best_Planck.dat")
deriv_DlTE_theta1 = data[:,1]
deriv_DlTE_theta2 = data[:,2]
deriv_DlTE_theta3 = data[:,3]
deriv_DlTE_theta4 = data[:,4]
deriv_DlTE_theta5 = data[:,5]
deriv_DlTE_theta6 = data[:,6]
$$ F_{ij} = \sum_{l} \sum_{k}\frac{1}{(\sigma_l^k)^2}\frac{\partial D^{k}_{\ell}}{\partial\theta_i}\frac{\partial D^{k}_{\ell}}{\partial\theta_j} $$
sigma_l_Planck_TT.shape
(92,)
sigma = np.array([sigma_l_Planck_TT,sigma_l_Planck_EE, sigma_l_Planck_TE])
model_power_spectrum = np.array([TT_model_Planck,EE_model_Planck,TE_model_Planck])
derivatives_TT = np.array([deriv_DlTT_theta1, deriv_DlTT_theta2,
                           deriv_DlTT_theta3, deriv_DlTT_theta4,
                           deriv_DlTT_theta5, deriv_DlTT_theta6])
derivatives_EE = np.array([deriv_DlEE_theta1, deriv_DlEE_theta2,
                           deriv_DlEE_theta3, deriv_DlEE_theta4,
                           deriv_DlEE_theta5, deriv_DlEE_theta6])
derivatives_TE = np.array([deriv_DlTE_theta1, deriv_DlTE_theta2,
                           deriv_DlTE_theta3, deriv_DlTE_theta4,
                           deriv_DlTE_theta5, deriv_DlTE_theta6])
derivatives = [derivatives_TT, derivatives_EE, derivatives_TE]
n_params = len(derivatives[0])
fisher_matrix = np.zeros((n_params, n_params))
for i in range(len(model_power_spectrum)):
    derivs = derivatives[i]
    for j in range(n_params):
        for k in range(n_params):
            fisher_matrix[j, k] += np.sum(derivs[j] * derivs[k] / sigma[i]**2)
Cov = np.linalg.inv(fisher_matrix)
print('1-d constraints:')
print( 'H0 = %.2f +/- %.2f' % (H0, np.sqrt(Cov[0,0])), '\nOmega_b h^2 = %.5f +/- %.5f' % (ombh2, np.sqrt(Cov[1,1])), '\nOmega_c h^2 = %.4f +/- %.4f' % (omch2, np.sqrt(Cov[2,2])), '\nn_s = %.4f +/- %.4f' % (ns, np.sqrt(Cov[3,3])), '\n10^9A_s = %.4f, +/- %.4f' % (As*1e9, np.sqrt(Cov[4,4])*1e9), '\ntau = %.5f +/- %.5f' % (tau, np.sqrt(Cov[5,5])) )
1-d constraints: H0 = 67.27 +/- 0.59 Omega_b h^2 = 0.02225 +/- 0.00014 Omega_c h^2 = 0.1198 +/- 0.0013 n_s = 0.9645 +/- 0.0037 10^9A_s = 2.2065, +/- 0.0494 tau = 0.07900 +/- 0.01156
Now from the covariance matrix, we can plot 1-d and 2-d constraints on the parameters. (See Fig. 6 in Planck 2015 paper https://arxiv.org/pdf/1502.01589v3.pdf)
1-d constraint (corresponding to the plots along the diagonal in Fig. 6, Planck 2015 paper):
First, the $i$th diagonal element of the covariance matrix corresponds to $\sigma({\theta_i})^2$. Then, we can plot the 1-d constraint on the parameter $\theta_i$ assuming a normal distribution with mean $(\vec{\theta}_{best-fit})_i$ and variance $\sigma({\theta_i})^2$.
2-d constraint (off-diagonal plots in Fig. 6, Planck 2015 paper):
Consider two parameters $\theta_i$ and $\theta_j$ from $\vec{\theta}$ and marginalize over the other parameters. To marginalize, you can simply remove those parameters' rows and columns from the full covariance matrix. (From the full covariance matrix, you know the variance of all six parameters and their covariances with each other, so build a smaller, $2 \times 2$, covariance matrix from it.) This gives:
$$ \mathrm{C_{ij}} = \begin{pmatrix} \sigma({\theta_i})^2 & \mathrm{Cov}({\theta_i, \theta_j}) \\ \mathrm{Cov}({\theta_i, \theta_j}) & \sigma({\theta_j})^2 \end{pmatrix} $$
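In code, removing the other parameters' rows and columns amounts to index selection; here is a sketch on a toy covariance matrix with hypothetical values:

```python
import numpy as np

# Hypothetical 4-parameter covariance matrix
Cov = np.array([[4.0, 1.0, 0.2, 0.0],
                [1.0, 9.0, 0.5, 0.1],
                [0.2, 0.5, 1.0, 0.3],
                [0.0, 0.1, 0.3, 2.0]])

# Keep only parameters i and j: select their rows and columns
i, j = 0, 2
Cov2 = Cov[np.ix_([i, j], [i, j])]   # 2x2 marginalized covariance
print(Cov2)
```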
Now, we can plot the 2-dimensional confidence-region ellipses from this matrix. The lengths of the ellipse axes are the square roots of the eigenvalues of the covariance matrix, and the counter-clockwise rotation of the ellipse is given by the angle:
$$ \phi = \frac{1}{2} \arctan\Big( \frac{2\cdot \mathrm{Cov}(\theta_i, \theta_j)}{\sigma({\theta_i})^2-\sigma({\theta_j})^2} \Big) = \arctan\Big(\frac{\vec{v}_1(y)}{\vec{v}_1(x)}\Big) $$
where $\vec{v_1}$ is the eigenvector with the largest eigenvalue. So we calculate the angle of the largest eigenvector towards the x-axis to obtain the orientation of the ellipse.
Then, we multiply the axis lengths by some factor depending on the confidence level we are interested in. For 68%, this scale factor is $\sqrt{\Delta \chi^2} \approx 1.52$. For 95%, it is $\sqrt{\Delta \chi^2} \approx 2.48$.
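These scale factors can be checked against the $\chi^2$ distribution with 2 degrees of freedom, using the 1σ and 2σ probabilities (68.27% and 95.45%); small differences from the quoted values come from rounding:

```python
import numpy as np
from scipy.stats import chi2

# sqrt(Delta chi^2) for 2-d confidence regions
scale_68 = np.sqrt(chi2.ppf(0.6827, df=2))   # ~1.52
scale_95 = np.sqrt(chi2.ppf(0.9545, df=2))   # close to the 2.48 quoted above
print(scale_68, scale_95)
```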
Hint: For plotting ellipses, see HW3 Problem 1 Part 7.
2. From the covariance matrix, plot 1-d and 2-d constraints on the parameters. Note that the best-fit values of the six parameters are already given in Part 1 (we just use the values from the Planck paper). For the 2-d plots, show 68% and 95% confidence ellipses for pairs of parameters. Arrange those subplots in a triangle shape, as in Fig. 6, Planck 2015 (https://arxiv.org/pdf/1502.01589v3.pdf).
from matplotlib.patches import Ellipse
import matplotlib as mpl
# Triangle Plot (Original skeleton code by Nicholas Kern)
# Only a suggestion. You can create your own if you wish.
fig, axes = plt.subplots(6,6,figsize=(12,12))
fig.subplots_adjust(wspace=0, hspace=0)
p_tex = np.array([r'$H_0$', r'$\Omega_bh^2$',r'$\Omega_ch^2$',r'$n_s$',r'$10^9 A_s$',r'$\tau$'])
def gaussian(x, mu, sigma):
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

for i in range(6):
    for j in range(6):
        ax = axes[i, j]
        if j > i:
            ax.axis('off')
            continue
        elif i == j:
            # diagonal part: 1-d Gaussian for parameter i
            ax.grid(True)
            sigma_cov = np.sqrt(Cov[i, i])
            mean_data = theta_best_Planck[i]
            xarr = np.linspace(mean_data - 5*sigma_cov, mean_data + 5*sigma_cov, 200)
            yarr = gaussian(xarr, mean_data, sigma_cov)
            ax.plot(xarr, yarr, lw=1)
            # ax.set_xlim(...)
            # ax.set_xticks(...)
            # ax.set_yticklabels([])
            # ax.set_xticklabels([])
        else:
            # off-diagonal part: x-axis = parameter j, y-axis = parameter i
            ax.grid(True)
            # 2x2 marginalized covariance (x-parameter first, matching the plot axes)
            CovM = np.array([[Cov[j, j], Cov[j, i]],
                             [Cov[i, j], Cov[i, i]]])
            marg_sigma = np.sqrt(np.diag(CovM))
            # Get eigenvalues/eigenvectors using SVD
            eigvec, eigval, u = np.linalg.svd(CovM)
            # Full major/minor axis lengths of the 1-sigma ellipse
            semimaj = np.sqrt(eigval[0])*2.
            semimin = np.sqrt(eigval[1])*2.
            # Rotation angle: angle of the largest eigenvector (first column of eigvec) to the x-axis
            theta = np.arctan2(eigvec[1][0], eigvec[0][0])
            # Create ellipses
            ell = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=1.52*semimaj, height=1.52*semimin, angle=theta*180/np.pi, facecolor='dodgerblue', edgecolor='royalblue', label='68% confidence')
            ell2 = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=2.48*semimaj, height=2.48*semimin, angle=theta*180/np.pi, facecolor='skyblue', edgecolor='royalblue', label='95% confidence')
            ax.add_patch(ell2)
            ax.add_patch(ell)
            ax.set_xlim(theta_best_Planck[j] - 5*marg_sigma[0], theta_best_Planck[j] + 5*marg_sigma[0])
            ax.set_ylim(theta_best_Planck[i] - 5*marg_sigma[1], theta_best_Planck[i] + 5*marg_sigma[1])
            # ax.set_xticks(...)
            # ax.set_yticks(...)
        if j != 0:
            ax.set_yticklabels([])
        if i != 5:
            ax.set_xticklabels([])
        if j == 0 and i != 0:
            ax.set_ylabel(p_tex[i], fontsize=16)
            # ax.set_yticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_yticklabels()]
        if i == 5:
            ax.set_xlabel(p_tex[j], fontsize=16)
            # ax.set_xticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_xticklabels()]
Now, assume that we have an ideal, zero-noise CMB survey with $N_l = 0$. However, we are still intrinsically limited in the number of independent modes we can measure (there are only $2l+1$ of them): $C_l = \frac{1}{2l+1}\sum_{m=-l}^{l}\langle|a_{lm}|^2\rangle$. This means that we incur an intrinsic error (called "cosmic variance") in our estimate of $C_l$. So we approximate that
$$ (\sigma_l^{EE})^2 = \frac{2}{(2l+1) \cdot f_{sky} \cdot \Delta l}(D_l^{EE})^2,\ \ (\sigma_l^{TT})^2 = \frac{2}{(2l+1) \cdot f_{sky} \cdot \Delta l}(D_l^{TT})^2,$$
$$ (\sigma_l^{TE})^2 = \frac{2}{(2l+1) \cdot f_{sky} \cdot \Delta l}\frac{(D_l^{TE})^2 + D_l^{TT}D_l^{EE}}{2}. $$
3. First, compute $\sigma_l$ for this zero-noise futuristic survey (assuming $N_l^k = 0$). Then repeat Parts 1 and 2. (How well does a zero-noise CMB survey constrain the cosmological parameters?)
f = 1
N = len(l)
sigma_EE = np.zeros(N)
sigma_TT = np.zeros(N)
sigma_TE = np.zeros(N)
def sigma_EE__(delta, index):
    return np.sqrt((2*EE_model_Planck[index]**2)/((2*l[index] + 1) * f * delta))

def sigma_TT__(delta, index):
    return np.sqrt((2*TT_model_Planck[index]**2)/((2*l[index] + 1) * f * delta))

def sigma_TE__(delta, index):
    # The factor of 2 in the prefactor cancels against the /2 in the TE variance
    return np.sqrt((TE_model_Planck[index]**2 + TT_model_Planck[index]*EE_model_Planck[index])/((2*l[index] + 1) * f * delta))

for index in range(len(l)):
    if l[index] < 30:
        delta = 1
    else:
        delta = 30
    sigma_EE[index] = sigma_EE__(delta, index)
    sigma_TT[index] = sigma_TT__(delta, index)
    sigma_TE[index] = sigma_TE__(delta, index)
sigma = np.array([sigma_TT,sigma_EE, sigma_TE])
fisher_matrix = np.zeros((n_params, n_params))
for i in range(len(model_power_spectrum)):
    derivs = derivatives[i]
    for j in range(n_params):
        for k in range(n_params):
            fisher_matrix[j, k] += np.sum(derivs[j] * derivs[k] / sigma[i]**2)
Cov_ = np.linalg.inv(fisher_matrix)
print('1-d constraints:')
print( 'H0 = %.2f +/- %.2f' % (H0, np.sqrt(Cov_[0,0])), '\nOmega_b h^2 = %.5f +/- %.5f' % (ombh2, np.sqrt(Cov_[1,1])), '\nOmega_c h^2 = %.4f +/- %.4f' % (omch2, np.sqrt(Cov_[2,2])), '\nn_s = %.4f +/- %.4f' % (ns, np.sqrt(Cov_[3,3])), '\n10^9A_s = %.4f, +/- %.4f' % (As*1e9, np.sqrt(Cov_[4,4])*1e9), '\ntau = %.5f +/- %.5f' % (tau, np.sqrt(Cov_[5,5])) )
1-d constraints: H0 = 67.27 +/- 0.18 Omega_b h^2 = 0.02225 +/- 0.00004 Omega_c h^2 = 0.1198 +/- 0.0005 n_s = 0.9645 +/- 0.0015 10^9A_s = 2.2065, +/- 0.0098 tau = 0.07900 +/- 0.00242
# Triangle Plot (Original skeleton code by Nicholas Kern)
# Only a suggestion. You can create your own if you wish.
fig, axes = plt.subplots(6,6,figsize=(12,12))
fig.subplots_adjust(wspace=0, hspace=0)
p_tex = np.array([r'$H_0$', r'$\Omega_bh^2$',r'$\Omega_ch^2$',r'$n_s$',r'$10^9 A_s$',r'$\tau$'])
def gaussian(x, mu, sigma):
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

# Use the zero-noise covariance Cov_ computed above
for i in range(6):
    for j in range(6):
        ax = axes[i, j]
        if j > i:
            ax.axis('off')
            continue
        elif i == j:
            # diagonal part: 1-d Gaussian for parameter i
            ax.grid(True)
            sigma_cov = np.sqrt(Cov_[i, i])
            mean_data = theta_best_Planck[i]
            xarr = np.linspace(mean_data - 5*sigma_cov, mean_data + 5*sigma_cov, 200)
            yarr = gaussian(xarr, mean_data, sigma_cov)
            ax.plot(xarr, yarr, lw=1)
            # ax.set_xlim(...)
            # ax.set_xticks(...)
            # ax.set_yticklabels([])
            # ax.set_xticklabels([])
        else:
            # off-diagonal part: x-axis = parameter j, y-axis = parameter i
            ax.grid(True)
            # 2x2 marginalized covariance (x-parameter first, matching the plot axes)
            CovM = np.array([[Cov_[j, j], Cov_[j, i]],
                             [Cov_[i, j], Cov_[i, i]]])
            marg_sigma = np.sqrt(np.diag(CovM))
            # Get eigenvalues/eigenvectors using SVD
            eigvec, eigval, u = np.linalg.svd(CovM)
            # Full major/minor axis lengths of the 1-sigma ellipse
            semimaj = np.sqrt(eigval[0])*2.
            semimin = np.sqrt(eigval[1])*2.
            # Rotation angle: angle of the largest eigenvector (first column of eigvec) to the x-axis
            theta = np.arctan2(eigvec[1][0], eigvec[0][0])
            # Create ellipses
            ell = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=1.52*semimaj, height=1.52*semimin, angle=theta*180/np.pi, facecolor='dodgerblue', edgecolor='royalblue', label='68% confidence')
            ell2 = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=2.48*semimaj, height=2.48*semimin, angle=theta*180/np.pi, facecolor='skyblue', edgecolor='royalblue', label='95% confidence')
            ax.add_patch(ell2)
            ax.add_patch(ell)
            ax.set_xlim(theta_best_Planck[j] - 5*marg_sigma[0], theta_best_Planck[j] + 5*marg_sigma[0])
            ax.set_ylim(theta_best_Planck[i] - 5*marg_sigma[1], theta_best_Planck[i] + 5*marg_sigma[1])
            # ax.set_xticks(...)
            # ax.set_yticks(...)
        if j != 0:
            ax.set_yticklabels([])
        if i != 5:
            ax.set_xticklabels([])
        if j == 0 and i != 0:
            ax.set_ylabel(p_tex[i], fontsize=16)
            # ax.set_yticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_yticklabels()]
        if i == 5:
            ax.set_xlabel(p_tex[j], fontsize=16)
            # ax.set_xticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_xticklabels()]
4. Combine Parts 2 and 3 and compare. (First, plot your results from Part 2: 1-d and 2-d constraints using the Planck power spectra and noise. Then, plot your Part 3 results (assuming zero noise) on top in different colors. Note that your 1-d constraints in Part 3 are more sharply peaked Gaussians (with much smaller variances), so you may scale them so that their peak amplitudes match your results from Part 2.)
# Triangle Plot (Original skeleton code by Nicholas Kern)
# Only a suggestion. You can create your own if you wish.
fig, axes = plt.subplots(6,6,figsize=(12,12))
fig.subplots_adjust(wspace=0, hspace=0)
p_tex = np.array([r'$H_0$', r'$\Omega_bh^2$',r'$\Omega_ch^2$',r'$n_s$',r'$10^9 A_s$',r'$\tau$'])
def gaussian(x, mu, sigma):
    return 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

# First layer: Planck-noise constraints (Cov), in blue
for i in range(6):
    for j in range(6):
        ax = axes[i, j]
        if j > i:
            ax.axis('off')
            continue
        elif i == j:
            # diagonal part: 1-d Gaussian for parameter i
            ax.grid(True)
            sigma_cov = np.sqrt(Cov[i, i])
            mean_data = theta_best_Planck[i]
            xarr = np.linspace(mean_data - 5*sigma_cov, mean_data + 5*sigma_cov, 200)
            yarr = gaussian(xarr, mean_data, sigma_cov)
            ax.plot(xarr, yarr, lw=1)
            # ax.set_xlim(...)
            # ax.set_xticks(...)
            # ax.set_yticklabels([])
            # ax.set_xticklabels([])
        else:
            # off-diagonal part: x-axis = parameter j, y-axis = parameter i
            ax.grid(True)
            # 2x2 marginalized covariance (x-parameter first, matching the plot axes)
            CovM = np.array([[Cov[j, j], Cov[j, i]],
                             [Cov[i, j], Cov[i, i]]])
            marg_sigma = np.sqrt(np.diag(CovM))
            # Get eigenvalues/eigenvectors using SVD
            eigvec, eigval, u = np.linalg.svd(CovM)
            # Full major/minor axis lengths of the 1-sigma ellipse
            semimaj = np.sqrt(eigval[0])*2.
            semimin = np.sqrt(eigval[1])*2.
            # Rotation angle: angle of the largest eigenvector (first column of eigvec) to the x-axis
            theta = np.arctan2(eigvec[1][0], eigvec[0][0])
            # Create ellipses
            ell = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=1.52*semimaj, height=1.52*semimin, angle=theta*180/np.pi, facecolor='dodgerblue', edgecolor='royalblue', label='68% confidence')
            ell2 = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=2.48*semimaj, height=2.48*semimin, angle=theta*180/np.pi, facecolor='skyblue', edgecolor='royalblue', label='95% confidence')
            ax.add_patch(ell2)
            ax.add_patch(ell)
            ax.set_xlim(theta_best_Planck[j] - 5*marg_sigma[0], theta_best_Planck[j] + 5*marg_sigma[0])
            ax.set_ylim(theta_best_Planck[i] - 5*marg_sigma[1], theta_best_Planck[i] + 5*marg_sigma[1])
            # ax.set_xticks(...)
            # ax.set_yticks(...)
        if j != 0:
            ax.set_yticklabels([])
        if i != 5:
            ax.set_xticklabels([])
        if j == 0 and i != 0:
            ax.set_ylabel(p_tex[i], fontsize=16)
            # ax.set_yticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_yticklabels()]
        if i == 5:
            ax.set_xlabel(p_tex[j], fontsize=16)
            # ax.set_xticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_xticklabels()]
# Second layer: zero-noise constraints (Cov_), in red/orange, plotted on top
for i in range(6):
    for j in range(6):
        ax = axes[i, j]
        if j > i:
            ax.axis('off')
            continue
        elif i == j:
            # diagonal part: zero-noise 1-d Gaussian on top of the Planck one
            ax.grid(True)
            sigma_cov = np.sqrt(Cov_[i, i])
            mean_data = theta_best_Planck[i]
            xarr = np.linspace(mean_data - 5*sigma_cov, mean_data + 5*sigma_cov, 200)
            yarr = gaussian(xarr, mean_data, sigma_cov)
            ax.plot(xarr, yarr, lw=1)
            # ax.set_xlim(...)
            # ax.set_xticks(...)
            # ax.set_yticklabels([])
            # ax.set_xticklabels([])
        else:
            # off-diagonal part: x-axis = parameter j, y-axis = parameter i
            ax.grid(True)
            # 2x2 marginalized covariance (x-parameter first, matching the plot axes)
            CovM = np.array([[Cov_[j, j], Cov_[j, i]],
                             [Cov_[i, j], Cov_[i, i]]])
            marg_sigma = np.sqrt(np.diag(CovM))
            # Get eigenvalues/eigenvectors using SVD
            eigvec, eigval, u = np.linalg.svd(CovM)
            # Full major/minor axis lengths of the 1-sigma ellipse
            semimaj = np.sqrt(eigval[0])*2.
            semimin = np.sqrt(eigval[1])*2.
            # Rotation angle: angle of the largest eigenvector (first column of eigvec) to the x-axis
            theta = np.arctan2(eigvec[1][0], eigvec[0][0])
            # Create ellipses
            ell = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=1.52*semimaj, height=1.52*semimin, angle=theta*180/np.pi, facecolor='red', edgecolor='red', label='68% confidence')
            ell2 = mpl.patches.Ellipse(xy=[theta_best_Planck[j], theta_best_Planck[i]], width=2.48*semimaj, height=2.48*semimin, angle=theta*180/np.pi, facecolor='orange', edgecolor='orange', label='95% confidence')
            ax.add_patch(ell2)
            ax.add_patch(ell)
            # Keep the axis limits from the Planck layer above
            # ax.set_xlim(...)
            # ax.set_ylim(...)
            # ax.set_xticks(...)
            # ax.set_yticks(...)
        if j != 0:
            ax.set_yticklabels([])
        if i != 5:
            ax.set_xticklabels([])
        if j == 0 and i != 0:
            ax.set_ylabel(p_tex[i], fontsize=16)
            # ax.set_yticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_yticklabels()]
        if i == 5:
            ax.set_xlabel(p_tex[j], fontsize=16)
            # ax.set_xticklabels(...)
            [tl.set_rotation(26) for tl in ax.get_xticklabels()]
print("Legend:\n\t Blue = Planck survey (dark blue = 68%, light blue = 95%)\n\t Red/orange = zero noise (red = 68%, orange = 95%)")
Legend: Blue = Planck survey (dark blue = 68%, light blue = 95%) Red/orange = zero noise (red = 68%, orange = 95%)
5. Starting from the best-fit values from the Planck 2015 paper, you constrained six cosmological parameters assuming that you have a zero-noise future CMB survey. Compare your results with Table 3 and Figure 6 in https://arxiv.org/pdf/1502.01589v3.pdf.
Answer:
It is clear that an ideal, zero-noise survey would significantly improve the precision of the cosmological parameter constraints (the 1-d posteriors become much more sharply peaked). The comparison plot above shows that the 68% confidence regions of a zero-noise survey are smaller than even the Planck 95% confidence regions for these cosmological parameters.
"Independent component analysis was originally developed to deal with problems that are closely related to the cocktail-party problem. The goal is to find a linear representation of nongaussian data so that the components are statistically independent, or as independent as possible. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction and signal separation." More details on ICA can be found in https://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf
In this problem, we take mixed sounds and images and apply ICA to them to separate the sources.
1. Read the 3 sound files, and plot them as a function of time. (In order to better see the features, you may plot them with some offsets.)
from scipy.io import wavfile
fs, data1 = wavfile.read('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound1.wav')
fs, data2 = wavfile.read('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound2.wav')
fs, data3 = wavfile.read('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound3.wav')
#fs is the sample rate, i.e., how many data points in one second.
data = np.float64(np.c_[data1, data2, data3])
data[:,0].shape
(308700,)
plt.figure(figsize = (10,7))
time = np.linspace(0, len(data1)/fs, len(data1))
plt.plot(time, data[:,0], lw = 1)
plt.plot(time, data[:,1]+ 5e4, lw = 1)
plt.plot(time, data[:,2]+1e5, lw = 1)
plt.xlabel('Time [second]')
plt.ylabel('Sound signal + offset')
canvas_ticks(plt)
plt.show()
2. Now run the following cells and play the sounds.
from IPython.display import Audio
Audio('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound1.wav')
Audio('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound2.wav')
Audio('/content/drive/My Drive/P188_288/P188_288_HW4/mix_sound3.wav')
You can tell there are 3 signals mixed in these sounds. You can think of these sounds as recorded by 3 different recorders, each at a different distance from the 3 sources. In other words,
\begin{equation}
\boldsymbol{X} = \boldsymbol{A} \boldsymbol{S} + \boldsymbol{\mu} \tag{1}
\end{equation}
where $\boldsymbol X$ is a vector of these 3 sounds, $\boldsymbol S$ is a vector of 3 signals, $\boldsymbol \mu$ is a vector of the mean of $\boldsymbol X$, and $\boldsymbol A$ is the mixing matrix.
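Equation (1) in code, with hypothetical sizes (3 recorders, 5 time samples) just to fix the shape conventions:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 5))     # 3 source signals x 5 time samples
A = rng.standard_normal((3, 3))     # mixing matrix: source-to-recorder weights
mu = rng.standard_normal((3, 1))    # per-recorder mean, broadcast over samples
X = A @ S + mu                      # each recording is a linear mix of the sources
print(X.shape)                      # (3, 5)
```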
Next, using sklearn's FastICA module, we will separate the signals from these sounds.
(1) Define the fastICA model:
ica = FastICA(algorithm='parallel')
(2) Using "fit_transform," fit the model with the data and obtain the signals
S = ica.fit_transform(data)
3. Find the 3 signals in the sound files. Plot the signals. (Again, you may plot them with some offsets.)
from sklearn.decomposition import FastICA
ica = FastICA(algorithm='parallel')
S = ica.fit_transform(data)
/usr/local/lib/python3.10/dist-packages/sklearn/decomposition/_fastica.py:542: FutureWarning: Starting in v1.3, whiten='unit-variance' will be used by default. warnings.warn(
plt.figure(figsize = (10,7))
plt.plot(time, S[:,0], lw = 1)
plt.plot(time, S[:,1] + .02, lw = 1)
plt.plot(time, S[:,2] +.04, lw = 1)
canvas_ticks(plt)
plt.xlabel('Time [second]')
plt.ylabel('Sound signal + offset')
plt.show()
You will find that the amplitude of the recovered signals is very small. This is because FastICA whitens the data before applying ICA, so that the covariance matrix of the signals is the identity. We can amplify the signals by multiplying them by a large number.
4. Now let's save the signals as wav files and play the sounds.
Amp = 1e6
wavfile.write('signal_sound1.wav', fs, np.int16(Amp*S[:,0]))
wavfile.write('signal_sound2.wav', fs, np.int16(Amp*S[:,1]))
wavfile.write('signal_sound3.wav', fs, np.int16(Amp*S[:,2]))
Play the sounds.
Audio('signal_sound1.wav')
Audio('signal_sound2.wav')
Audio('signal_sound3.wav')
Now we can reconstruct the mixed sounds from the signals. The mixing matrix is given by ica.mixing_, and the mean of the data is given by ica.mean_. Note that the $\boldsymbol X$ and $\boldsymbol S$ in equation (1) are matrices of shape (Nsignal, Nsample), whereas the data and the signals you get from FastICA are matrices of shape (Nsample, Nsignal).
5. Reconstruct the sounds from the source signals, and show that the reconstructed sounds are very close to the given sounds using np.allclose.
#mixing matrix
A = ica.mixing_
mu = ica.mean_
X = np.dot(A,S.T).T + mu
print(np.allclose(X, data))
True
6. ICA requires the data to be centered and whitened. The FastICA module from sklearn does the centering and whitening automatically. Now let's disable the data preprocessing in FastICA with ica = FastICA(whiten=False), and then redo the analysis in parts 3 and 4. Plot and play the sounds. Does ICA work without data preprocessing?
ica = FastICA(whiten=False)
S = ica.fit_transform(data)
plt.figure(figsize = (10,7))
plt.plot(time, S[:,0], lw = 1)
plt.plot(time, S[:,1] + 5e4, lw = 1)
plt.plot(time, S[:,2] + 1e5, lw = 1)
canvas_ticks(plt)
plt.xlabel('Time [second]')
plt.ylabel('Sound signal + offset')
plt.show()
Save the sounds.
Amp = 1
wavfile.write('signal_sound1_no_proc.wav', fs, np.int16(Amp*S[:,0]))
wavfile.write('signal_sound2_no_proc.wav', fs, np.int16(Amp*S[:,1]))
wavfile.write('signal_sound3_no_proc.wav', fs, np.int16(Amp*S[:,2]))
Play the sounds.
Audio('signal_sound1_no_proc.wav')
Audio('signal_sound2_no_proc.wav')
Audio('signal_sound3_no_proc.wav')
I think I scared my roommate... that was horrible; we need whitening!
7. Principal component analysis (PCA) also tries to interpret the underlying structure of the data by decomposing the data into linear combinations of the principal components. Now let's apply PCA to the sounds and see if the signals are cleanly separated in the principal components. Plot the principal components, save them as wav files and play the sounds. How does it compare to parts 3 and 4?
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
fit_pca = pca.fit_transform(data)
plt.figure(figsize = (10,7))
plt.plot(time, fit_pca[:,0], lw = 1)
plt.plot(time, fit_pca[:,1] + 5e4, lw = 1)
plt.plot(time, fit_pca[:,2] + 7e4, lw = 1)
canvas_ticks(plt)
plt.xlabel('Time [second]')
plt.ylabel('Sound signal + offset')
plt.show()
Save the sounds.
Amp = 1
wavfile.write('signal_sound1_PCA.wav', fs, np.int16(Amp*fit_pca[:,0]))
wavfile.write('signal_sound2_PCA.wav', fs, np.int16(Amp*fit_pca[:,1]))
wavfile.write('signal_sound3_PCA.wav', fs, np.int16(Amp*fit_pca[:,2]))
Play the sounds.
Audio('signal_sound1_PCA.wav')
Audio('signal_sound2_PCA.wav')
Audio('signal_sound3_PCA.wav')
Now let's take a look at another example. Suppose now we have some linearly mixed images, and we are going to find the original photos with ICA. (This example is from https://github.com/vishwajeet97/Cocktail-Party-Problem)
8. Load in photos, and plot them.
import matplotlib.image as mpimg
img1=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/unos.jpg')
img2=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/dos.jpg')
img3=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/tres.jpg')
images_ = np.array([img1, img2, img3])
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axs):
ax.imshow(images_[i], cmap = 'gray')
ax.axis('off')
plt.tight_layout()
plt.show()
The image is a 2D array. You will need to flatten the data for the following analysis.
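Flattening is just a reshape, and it is exactly invertible; a minimal sketch assuming 200×200 grayscale images (the size used below):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(200, 200))   # stand-in for a grayscale image
flat = img.flatten()                          # 1D vector of length 40000
restored = flat.reshape((200, 200))           # undo the flattening
print(np.array_equal(restored, img))
```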
9. Redo the analysis in parts 3 and 4. Separate the original photos and plot them. Note that the sign of the signals recovered by ICA may not be correct, so you may need to multiply the photos by -1.
images = np.array([img1.flatten(), img2.flatten(),img3.flatten()]).T
ica = FastICA(algorithm='parallel')
S = ica.fit_transform(images)
S[:,0].reshape((200,200)).shape
(200, 200)
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axs):
imag = S[:,i].reshape((200,200))
# the sign of ICA components is arbitrary; flip for display
ax.imshow(-imag, cmap = 'gray')
ax.axis('off')
plt.tight_layout()
plt.show()
Plot the original photos.
images_ = np.array([img1, img2, img3])
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axs):
ax.imshow(images_[i], cmap = 'gray')
ax.axis('off')
plt.tight_layout()
plt.show()
The ICA algorithm tries to find the most non-Gaussian directions of the given data. FastICA uses the KL divergence between the data and a standard Gaussian (negentropy) to characterize the non-Gaussianity. Another way to measure non-Gaussianity is to use the Wasserstein distance between the data and a standard Gaussian. In 1D, the Wasserstein distance, also called the earth mover's distance, has a closed-form solution in terms of the cumulative distribution function (CDF). Below we provide you the code for doing ICA using the Wasserstein distance. The code searches for the most non-Gaussian directions by maximizing the Wasserstein distance between the data and a Gaussian.
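For two equal-sized 1D samples, the closed form amounts to sorting both samples (the empirical inverse CDFs) and averaging the pointwise distances; a quick check against scipy's implementation for p = 1:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=10000)    # clearly non-Gaussian sample
g = rng.standard_normal(10000)        # Gaussian sample

# 1D closed form: W_1 is the mean gap between the sorted samples,
# i.e. the average distance between empirical inverse CDFs.
w1 = np.mean(np.abs(np.sort(x) - np.sort(g)))
print(np.isclose(w1, wasserstein_distance(x, g)))
```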
10. Perform ICA on the mixed photos from Q9 (do the following steps)
import torch
import torch.optim as optim
def Percentile(input, percentiles):
"""
Find the percentiles of a tensor along the last dimension.
Adapted from https://github.com/aliutkus/torchpercentile/blob/master/torchpercentile/percentile.py
"""
percentiles = percentiles.double()
in_sorted, in_argsort = torch.sort(input, dim=-1)
positions = percentiles * (input.shape[-1]-1) / 100
floored = torch.floor(positions)
ceiled = floored + 1
ceiled[ceiled > input.shape[-1] - 1] = input.shape[-1] - 1
weight_ceiled = positions-floored
weight_floored = 1.0 - weight_ceiled
d0 = in_sorted[..., floored.long()] * weight_floored
d1 = in_sorted[..., ceiled.long()] * weight_ceiled
result = d0+d1
return result
def SlicedWassersteinDistanceG(x, pg, q, p, perdim=True):
if q is None:
px = torch.sort(x, dim=-1)[0]
else:
px = Percentile(x, q)
if perdim:
WD = torch.mean(torch.abs(px-pg) ** p)
else:
WD = torch.mean(torch.abs(px-pg) ** p, dim=-1)
return WD
def SWD_prepare(Npercentile=100, device=torch.device("cuda:0"), gaussian=True):
start = 50 / Npercentile
end = 100-start
q = torch.linspace(start, end, Npercentile, device=device)
if gaussian:
pg = 2**0.5 * torch.erfinv(2*q/100-1)
return q, pg
def maxSWDdirection(x, x2='gaussian', n_component=None, maxiter=200, Npercentile=None, p=2, eps=1e-6):
#if x2 is None, find the direction of max sliced Wasserstein distance between x and gaussian
#if x2 is not None, find the direction of max sliced Wasserstein distance between x and x2
if x2 != 'gaussian':
assert x.shape[1] == x2.shape[1]
if x2.shape[0] > x.shape[0]:
x2 = x2[torch.randperm(x2.shape[0])][:x.shape[0]]
elif x2.shape[0] < x.shape[0]:
x = x[torch.randperm(x.shape[0])][:x2.shape[0]]
ndim = x.shape[1]
if n_component is None:
n_component = ndim
q = None
if x2 == 'gaussian':
if Npercentile is None:
q, pg = SWD_prepare(len(x), device=x.device)
q = None
else:
q, pg = SWD_prepare(Npercentile, device=x.device)
elif Npercentile is not None:
q = SWD_prepare(Npercentile, device=x.device, gaussian=False)
#initialize w. algorithm from https://arxiv.org/pdf/math-ph/0609050.pdf
wi = torch.randn(ndim, n_component, device=x.device)
Q, R = torch.qr(wi)
L = torch.sign(torch.diag(R))
w = (Q * L).T
lr = 0.1
down_fac = 0.5
up_fac = 1.5
c = 0.5
#algorithm from http://noodle.med.yale.edu/~hdtag/notes/steifel_notes.pdf
#note that here w = X.T
#use backtracking line search
w1 = w.clone()
w.requires_grad_(True)
if x2 == 'gaussian':
loss = -SlicedWassersteinDistanceG(w @ x.T, pg, q, p)
else:
loss = -SlicedWassersteinDistance(w @ x.T, w @ x2.T, q, p)
loss1 = loss
for i in range(maxiter):
GT = torch.autograd.grad(loss, w)[0]
w.requires_grad_(False)
WT = w.T @ GT - GT.T @ w
e = - w @ WT #dw/dlr
m = torch.sum(GT * e) #dloss/dlr
lr /= down_fac
while loss1 > loss + c*m*lr:
lr *= down_fac
if 2*n_component < ndim:
UT = torch.cat((GT, w), dim=0).double()
V = torch.cat((w.T, -GT.T), dim=1).double()
w1 = (w.double() - lr * w.double() @ V @ torch.pinverse(torch.eye(2*n_component, dtype=torch.double, device=x.device)+lr/2*UT@V) @ UT).to(torch.get_default_dtype())
else:
w1 = (w.double() @ (torch.eye(ndim, dtype=torch.double, device=x.device)-lr/2*WT.double()) @ torch.pinverse(torch.eye(ndim, dtype=torch.double, device=x.device)+lr/2*WT.double())).to(torch.get_default_dtype())
w1.requires_grad_(True)
if x2 == 'gaussian':
loss1 = -SlicedWassersteinDistanceG(w1 @ x.T, pg, q, p)
else:
loss1 = -SlicedWassersteinDistance(w1 @ x.T, w1 @ x2.T, q, p)
if torch.max(torch.abs(w1-w)) < eps:
w = w1
break
lr *= up_fac
loss = loss1
w = w1
if x2 == 'gaussian':
WD = SlicedWassersteinDistanceG(w @ x.T, pg, q, p, perdim=False)
else:
WD = SlicedWassersteinDistance(w @ x.T, w @ x2.T, q, p, perdim=False)
return w.T, WD**(1/p)
def ICA_Wasserstein(x):
A, WD = maxSWDdirection(torch.tensor(x, dtype=torch.float32))
return A.detach().numpy()
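The Gaussian percentiles pg computed in SWD_prepare come from the standard normal quantile function, since $\sqrt{2}\,\mathrm{erfinv}(2q-1) = \Phi^{-1}(q)$; this identity can be checked against scipy:

```python
import numpy as np
from scipy.special import erfinv
from scipy.stats import norm

q = np.linspace(0.5, 99.5, 100)         # percentile grid, as in SWD_prepare
pg = 2**0.5 * erfinv(2 * q / 100 - 1)   # Gaussian quantiles via erfinv
print(np.allclose(pg, norm.ppf(q / 100)))
```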
#load the data (reloading the same images as in Q8)
img1=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/unos.jpg')
img2=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/dos.jpg')
img3=mpimg.imread('/content/drive/My Drive/P188_288/P188_288_HW4/tres.jpg')
images = np.array([img1.flatten(), img2.flatten(),img3.flatten()]).T
#Whiten the data
pca = PCA(whiten=True, n_components = 3)
white_img =pca.fit_transform(images)
#ICA
A = ICA_Wasserstein(white_img)
#Recover S
A_inv = np.linalg.inv(A)
S_recovered = np.dot(A_inv ,white_img.T)
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axs):
imag = S_recovered[i].reshape((200,200))
if i == 2:
ax.imshow(imag, cmap = 'gray')
else:
ax.imshow(-imag, cmap = 'gray')
ax.axis('off')
plt.tight_layout()
plt.show()
Plot the original photos.
images_ = np.array([img1, img2, img3])
fig, axs = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axs):
ax.imshow(images_[i], cmap = 'gray')
ax.axis('off')
plt.tight_layout()
plt.show()
!jupyter nbconvert --to html "/content/drive/My Drive/P188_288/P188_288_HW4/HW4_188.ipynb"
[NbConvertApp] Converting notebook /content/drive/My Drive/P188_288/P188_288_HW4/HW4_188.ipynb to html [NbConvertApp] Writing 14776027 bytes to /content/drive/My Drive/P188_288/P188_288_HW4/HW4_188.html